Background, Overview, and Motivation:

There is no community that feels the convergence of government policy and in-home culture more than women and children. We are interested in studying mothers and infants for two reasons: they are uniquely vulnerable, and their health outcomes are disproportionately affected by various exposures and government policies. Our group members had interests in both environmental health and perinatal health, so we decided to assess the effect of environmental exposures on maternal and infant health outcomes. We chose to restrict our analysis of these exposures and outcomes to California for several reasons: data availability constraints, the size and racial diversity of the state's population, its comprehensive and transparent environmental regulations, and its high use of pesticides. California is the largest user of pesticides in the US, with over 85 million kg applied annually. That amount is equivalent to roughly 30% of the cumulative active ingredients applied to US agriculture.

In 1986, California passed “The Safe Drinking Water and Toxic Enforcement Act of 1986,” also known as Proposition 65. Proposition 65 requires businesses to provide warnings about significant exposures to chemicals that cause cancer, birth defects, or other reproductive harm. By requiring that this information be provided, Proposition 65 enables Californians to make informed decisions about their exposures to these chemicals. The state maintains a list of chemicals that are harmful under Proposition 65. The list is updated at least once a year and includes over 900 chemicals. Proposition 65 has motivated businesses to eliminate or reduce toxic chemicals in numerous consumer products and has led to the safer reformulation of many products. The law has also educated the general public about exposures to toxic chemicals in consumer products, buildings, and the environment. This in turn created demand and a market reward for less-toxic products.

California is the most populous state in the United States, with roughly 12% of annual births. It is also highly diverse in both demographics and landscape across its counties, which adds diversity to state-level data. Because of California's diverse population, we were able to assess exposure to racism (recorded as race) as a potential confounder of the relationship between environmental exposures and maternal health outcomes. Analysis and discussion of race and racism are critical in any maternal health research. The literature has no shortage of evidence pointing to disproportionately adverse health outcomes for Black mothers and babies in America. The infant mortality rate for Black babies in America is more than twice that of white babies, and Black babies are more than three times as likely to die from complications related to low birth weight. U.S. maternal mortality rates for Black women are also three to four times higher than rates for white women. For this reason, we decided to classify “the experience of racism” as a confounder in our analysis and to stratify by race to account for this confounding.

Initial Questions & Research Question Evolution:

Our initial question was broad: “What is the effect of pesticide use on maternal and child health?” We then narrowed our scope to data from California only, inspired by our background research and related work. Over the course of the project, we defined our exposure as pesticide use, a continuous variable measured in pounds. We also narrowed our outcomes of interest to fertility, birth weight, and gestational age. We further narrowed our scope to counties with high pesticide use and high agricultural activity, because the literature we reviewed (described in the “related work” section) focuses on these areas. The counties we focused on ranked in our data as the top 4 for pesticide use: Fresno, Kern, Tulare, and San Joaquin, in that order. We included Los Angeles as a comparison group for the exploratory analysis of the maternal and child health data because it is one of the most populous and most diverse counties in California. This mattered to us because we wanted to examine maternal and infant health outcomes stratified by race.
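The ranking step (total pounds applied per county, keep the top four) can be sketched in base R as below. The data frame `pesticide_toy`, its columns `County` and `Pounds`, the county labels, and all numbers are invented for illustration. They are not CDPR figures, and the project's real pesticide data frame may be structured differently.

```r
# Invented toy data: county labels and pounds are illustrative only
pesticide_toy <- data.frame(
  County = c("A", "B", "C", "D", "E", "F"),
  Pounds = c(120, 950, 40, 610, 780, 300)
)

# Total pounds per county (sums over all rows belonging to a county)
totals <- aggregate(Pounds ~ County, data = pesticide_toy, FUN = sum)

# Sort from highest to lowest use
totals <- totals[order(totals$Pounds, decreasing = TRUE), ]

# Keep the four highest-use counties, in rank order
top4 <- as.character(head(totals$County, 4))
print(top4)
```

With multiple years per county, the same `aggregate()` call sums across years before ranking. A per-year ranking would add `Year` to the formula instead.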

Data Sources:

*See notes regarding scraping, cleaning, and wrangling methods at each respective code chunk

Data Source for Pesticides:

Pesticide use data for California counties were retrieved from the California Department of Pesticide Regulation (CDPR) Pesticide Use Reporting program: https://www.cdpr.ca.gov/docs/pur/purmain.htm

Maternal and Child Health Data Source(s):

Maternal and Child Health outcome data was mainly obtained from the following three sources:

1. California Open Data Portal https://data.ca.gov/dataset/live-births-with-low-and-very-low-birthweight

2. CHHS Open Data https://data.chhs.ca.gov/dataset/preterm-and-very-preterm-live-births/resource/cff79e2d-6ecf-4158-9e4f-7078632220ee

3. Centers for Disease Control and Prevention (CDC) Natality Online Database, Wide-ranging OnLine Data for Epidemiologic Research (WONDER) system, Natality 2007-2019 https://wonder.cdc.gov/natality-current.html

Exploratory Analysis:

Final Analysis:

Load Packages

# tidyverse attaches ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats
library(tidyverse)
library(pdftools)
library(ggthemes)
library(shiny)
library(shinyBS)
library(RColorBrewer)
library(shinydashboard)  # masks graphics::box
# Spatial packages; rgeos, rgdal, and maptools were retired from CRAN in 2023 (sf is the maintained alternative)
library(sp)
library(rgeos)
library(rgdal)
library(maptools)
library(leaflet)
library(scales)     # masks purrr::discard and readr::col_factor
library(maps)       # masks purrr::map
library(gridExtra)  # masks dplyr::combine
library(grid)

MCH Data Wrangling *Low BW Only (for map)

We wanted to create a map to visualize the pattern of low birth weight in California. I initially used the CDC WONDER database, but its county-level information was very limited because it grouped data from rural and small counties into “unidentified counties.” I therefore used another data source, the California Open Data Portal. The data cleaning process follows:

# Data wrangling for the 2014-2018 data used in the map
lbwdata <- read.csv("./low-and-very-low-birthweight-by-county-2014-2018 (1).csv", header = TRUE, stringsAsFactors = FALSE)
lbwdata <- lbwdata %>% mutate(County = str_to_title(County))
# Replace missing Events counts with 0
lbwdata$Events[is.na(lbwdata$Events)] <- 0
# Sum Events across the remaining rows for each year and county
lbwdata <- lbwdata %>%
    group_by(Year, County, Total.Births) %>%
    summarize(Events = sum(Events), .groups = "drop_last")
# Remove the statewide total; County is title case after str_to_title()
lbwdata <- lbwdata %>% filter(County != "California")
lbwdata <- lbwdata %>% mutate(Rate = Events/Total.Births)
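The group-and-sum step collapses the rows for each county and year into one count before dividing by total births. The base-R toy version below shows the mechanics. The `Category` column, county names, and counts are invented.

Two points are worth checking against the source file:

- Summing category rows gives the number of low-birthweight births only if the categories in the file are mutually exclusive. That is an assumption here.
- Recoding a missing count to 0 pulls that county's rate down.

```r
# Invented rows shaped like the CSV: one row per county, year, and category
lbw_toy <- data.frame(
  Year         = c(2016, 2016, 2016, 2016),
  County       = c("Alpha", "Alpha", "Beta", "Beta"),
  Category     = c("Low", "Very Low", "Low", "Very Low"),  # hypothetical labels
  Total.Births = c(1000, 1000, 500, 500),
  Events       = c(60, 12, 30, NA)                         # NA = missing count
)

# Same choice as the main code: a missing count becomes 0
lbw_toy$Events[is.na(lbw_toy$Events)] <- 0

# One row per Year/County, then the proportion of births with the outcome
collapsed <- aggregate(Events ~ Year + County + Total.Births,
                       data = lbw_toy, FUN = sum)
collapsed$Rate <- collapsed$Events / collapsed$Total.Births
print(collapsed)
```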

MCH Data Wrangling *Preterm Birth Only (for map)

We also wanted to create a map to visualize the pattern of preterm birth in California. I ran into the same problem with the CDC WONDER database, so I used the CHHS database instead. The data cleaning process follows:

ptbirthdata <- read.csv("preterm-and-very-preterm-births-by-county-2010-2018-3.csv", header = TRUE, stringsAsFactors = FALSE)
# Replace missing Events counts with 0
ptbirthdata$Events[is.na(ptbirthdata$Events)] <- 0
# Drop columns 7 and 8 (selected by position)
ptbirthdata <- ptbirthdata[,-c(7,8)]
# Sum Events across the remaining rows for each year and county
ptbirthdata <- ptbirthdata %>%
    group_by(Year, County, Total.Births) %>%
    summarize(Events = sum(Events), .groups = "drop_last")
# Remove the statewide total row (case-insensitive match)
ptbirthdata <- ptbirthdata %>% filter(tolower(County) != "california")
# Preterm births per 100 live births
ptbirthdata <- ptbirthdata %>% mutate(rate_pt = Events/Total.Births * 100)
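Removing the statewide total row only works if the comparison string matches the file's capitalization exactly. In the low birth weight chunk, for example, `str_to_title()` runs before the filter, so the value to match is `"California"` rather than `"california"`. A case-insensitive comparison avoids that trap. The toy sketch below uses invented county names and counts.

```r
# Invented rows: a statewide total plus two counties
pt_toy <- data.frame(
  Year         = 2016,
  County       = c("California", "Alpha", "Beta"),
  Total.Births = c(1500, 1000, 500),
  Events       = c(140, 90, 50)
)

# Drop the statewide total regardless of how it is capitalized
pt_toy <- pt_toy[tolower(pt_toy$County) != "california", ]

# Preterm births per 100 live births (a percentage)
pt_toy$rate_pt <- pt_toy$Events / pt_toy$Total.Births * 100
print(pt_toy)
```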

Code for Creating the Map - Leaflet Map Using LBW Data (See Shiny App for the Final Result)

After cleaning the data set, I created a spatial map. The code below does four things:

- reads the 2018 U.S. Census county cartographic boundary shapefile (cb_2018_us_county_20m);
- keeps only California;
- merges that spatial object with the low birth weight data wrangled earlier;
- draws a leaflet map.

I chose leaflet because it lets the user see which county is which and zoom in and out. Note that some counties have NA values, perhaps because they have very small populations.

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

lbwdata_2016 <- lbwdata %>% filter(Year == "2016") %>% mutate(Rate = Events/Total.Births*100)
# Join county polygons (NAME) to the 2016 birth data (County)
spatial_lbw <- sp::merge(x=SingleState, y=lbwdata_2016, by.x="NAME", by.y="County")

# Bin edges in %; a lowest edge of 0 keeps every non-missing rate inside a bin
bins <- c(0, 6.3, 7.6, 8.1, Inf)
pal <- colorBin(
    palette = "viridis",
    domain = spatial_lbw$Rate, bins=bins)

leaflet(spatial_lbw, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
            setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
            addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                        opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                        fillColor = ~pal(Rate), 
                        popup = ~as.factor(paste0("<b><font size=\"4\"><center>County: </b>",spatial_lbw$NAME,"</font></center>","<b>% of Low Birth Weight Births: </b>", sprintf("%1.2f%%", spatial_lbw$Rate),"<br/>"))) %>%
            addLegend(pal = pal, values = spatial_lbw$Rate, opacity = 1, title="% Low Birth Weight (Quartiles)")
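leaflet's `colorBin()` assigns each value to a bin using base R's `cut()`, with left-closed intervals by default. Any non-missing rate below the lowest break therefore lands in no bin and is drawn in the NA color. leaflet reports this as "Some values were outside the color scale." The snippet below approximates that behavior by calling `cut()` directly, rather than calling leaflet. The rates and the lower edge of 4.0 are illustrative.

```r
# Illustrative rates (%), including one below the lowest break and one NA
rates  <- c(3.5, 5.0, 7.0, 9.0, NA)
breaks <- c(4.0, 6.3, 7.6, 8.1, Inf)   # breaks with a lower edge of 4.0

# Left-closed bins [a, b); include.lowest closes the top bin on the right
bin_index <- cut(rates, breaks = breaks, labels = FALSE,
                 right = FALSE, include.lowest = TRUE)
print(bin_index)
# 3.5 falls below every bin, so it becomes NA just like the missing rate
```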

Code for Creating the Map - Leaflet Map Using Preterm Birth Data Wrangled Earlier (See Shiny App for the Final Result)

This is a similar spatial map for preterm birth. The code below again builds the California county map from the shapefile, merges it with the preterm birth data wrangled earlier, and draws a leaflet map. Note that some counties have NA values, perhaps because they have very small populations.

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

ptbirthdata_2016 <- ptbirthdata %>% filter(Year == "2016") 
# Join county polygons (NAME) to the 2016 preterm data (County)
spatial_pt <- sp::merge(x=SingleState, y=ptbirthdata_2016, by.x="NAME", by.y="County")

# Bin edges in %; a lowest edge of 0 keeps every non-missing rate inside a bin
bin <- c(0, 8.2, 9.1, 9.9, Inf)
pal2 <- colorBin(
    palette = "plasma",
    domain = spatial_pt$rate_pt, bins=bin)

leaflet(spatial_pt, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
                    setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
                    addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                                opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                                fillColor = ~pal2(rate_pt), 
                                popup = ~as.factor(paste0("<b><font size=\"4\"><center>County: </b>",spatial_pt$NAME,"</font></center>","<b>% of Preterm Birth: </b>", sprintf("%1.2f%%", spatial_pt$rate_pt),"<br/>"))) %>% addLegend(pal = pal2, values = spatial_pt$rate_pt, opacity = 1, title="% Preterm Birth (Quartiles)")
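The legends label the bins as quartiles, but the breaks are typed in by hand. If the breaks are meant to track the data, one option is to compute them with `quantile()`. The sketch below uses invented rates. Because the minimum and maximum become the outer breaks, every non-missing value falls in a bin. Passing such a vector as `bins =` to `colorBin()` would be the corresponding leaflet-side change. That is a suggestion, not something the original code does.

```r
# Invented preterm rates (%)
rates <- c(5, 6, 7, 8, 9)

# Quartile breaks from the data (R's default type-7 quantiles)
q_breaks <- unname(quantile(rates, probs = seq(0, 1, 0.25), na.rm = TRUE))
print(q_breaks)

# Assign each rate to a quartile; include.lowest keeps the maximum in bin 4
quartile <- cut(rates, breaks = q_breaks, labels = FALSE,
                right = FALSE, include.lowest = TRUE)
print(quartile)
```

If several quantiles coincide, as with many tied rates, `cut()` stops with a "breaks are not unique" error. Wrapping the breaks in `unique()` avoids that.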

Maternal and Child Health Indicators Data Wrangling of CDC Data

The data sets used for the exploratory analysis of MCH indicators by county in California were downloaded from the CDC WONDER database in “.txt” format. I read the .txt files into R and turned them into data frames:

- MCH.CDC.Data contains maternal and infant health outcomes by county over the years.
- MCH.CDC.Data_Race contains the same variables, stratified by mother's race.

I also renamed the counties in both data frames to match the names (case and format) used in the pesticide data frames. This makes it easier to compare the two sources by county. The data wrangling and cleaning code for the CDC WONDER maternal and child health data follows.

#Data Wrangling for CDC Data (COMPLETE)
MCH.CDC.Data <- read.delim("NatalityTOTAL.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data <- MCH.CDC.Data[-c(491:585), ]
MCH.CDC.Data <- MCH.CDC.Data %>% filter(Notes != "Total")
MCH.CDC.Data <- MCH.CDC.Data[ ,-c(1,3,5,7,9)]

MCH.CDC.Data_Race <- read.delim("NatalityRACE.txt",  sep ="\t", dec=".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[-c(1794:1931), ]
MCH.CDC.Data_Race <- MCH.CDC.Data_Race[ ,-c(1,3,5,7,9,11)]

# Rename counties to match the pesticide data, e.g. "Fresno County, CA" -> "Fresno"
MCH.CDC.Data <- MCH.CDC.Data %>% mutate(County = str_remove(County, " County, CA$"))

# Same renaming for the race-stratified data
MCH.CDC.Data_Race <- MCH.CDC.Data_Race %>% mutate(County = str_remove(County, " County, CA$"))

MCH.CDC.Data_Race<- MCH.CDC.Data_Race %>% rename("Mothers.Race" = "Mother.s.Bridged.Race")
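Every county label in the WONDER export follows the pattern `<Name> County, CA`. The renaming can therefore be done with one regular expression instead of one assignment per county, which removes the chance of a typo in any single name. Afterwards, `setdiff()` against the pesticide data's county names flags anything that still will not join.

The labels below are illustrative. `pesticide_names` is a hypothetical stand-in for the pesticide data's county column. "Unidentified Counties, CA" is assumed here to be how WONDER labels its grouped small counties.

```r
# Illustrative WONDER-style labels, including a grouped "unidentified" row
wonder_county <- c("Fresno County, CA", "Santa Clara County, CA",
                   "Los Angeles County, CA", "Unidentified Counties, CA")

# Strip the trailing " County, CA" in one vectorized call
clean_county <- sub(" County, CA$", "", wonder_county)
print(clean_county)

# Hypothetical county names from the pesticide data
pesticide_names <- c("Fresno", "Santa Clara", "Los Angeles", "Kern")

# Labels that would not find a match when joining the two sources
unmatched <- setdiff(clean_county, pesticide_names)
print(unmatched)
```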

Data Key:

Exploratory Data Analysis of Maternal and Infant Health Indicators from the CDC Wonder Database:

Fertility Rate

The first variable I examined was fertility rate. I first visualized fertility rates across all counties over the years in a tile plot. I then viewed the trends in our counties of interest:

- Fresno, Kern, Tulare, and San Joaquin, which consistently rank as the top 4 in pesticide use;
- Los Angeles, as a comparison county, because it is one of the most populous and most diverse counties in California.

Fresno, Kern, Tulare, and San Joaquin counties are all part of the San Joaquin Valley. As mentioned in our background and related work sections, the valley is of special interest to us because it is California's most productive agricultural region and has some of the highest pesticide use.

#Fertility Rate GGPLOTS
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", limits = c(30,100),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County") + 
    ylab("") + xlab("")

#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)

#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)

#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Tulare County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)

#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data %>% group_by(County) %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in San Joaquin County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)

#Los Angeles, Comparison Group 
MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County", caption = "Data Source: CDC WONDER Online Database") + ylim(30,100)

#Grid Plots
p1 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Fresno County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)

p2 <- MCH.CDC.Data %>% group_by(County) %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Kern County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)

p3 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Tulare County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)

p4 <-MCH.CDC.Data %>% group_by(County) %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = County)) + geom_line() + labs( y="Fertility Rate", title = "Fertility Rate in Los Angeles County", subtitle = "2007-2019", color = "County") + theme(legend.position = "none") + ylim(30,100)

grid.arrange(p1, p2, p3, p4, bottom = "Data Source: CDC WONDER Online Database")
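CDC WONDER documents its natality fertility rate as live births per 1,000 females aged 15-44. The tile plot and the line plots share fixed limits of 30 to 100, so counties and years are read on one scale. The formula is below with invented numbers. The function name `fertility_rate` is hypothetical, not a WONDER field.

```r
# Live births per 1,000 women aged 15-44 (CDC WONDER's documented definition)
fertility_rate <- function(births, women_15_44) {
  1000 * births / women_15_44
}

# Invented example: 12,000 births to a population of 200,000 women aged 15-44
fr <- fertility_rate(births = 12000, women_15_44 = 200000)
print(fr)
```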

Racial Demographics

The next variable I examined was racial demographics. I stratified the data by race (via the MCH.CDC.Data_Race data frame) and viewed the total population of each race across all counties in tile plots. The goal was to see whether any one county had a particularly large population of a given race. I then viewed racial demographic trends in the same counties of interest used for fertility rate: the four top pesticide-use counties in the San Joaquin Valley (Fresno, Kern, Tulare, and San Joaquin) and Los Angeles as the comparison county.

#Racial Demographic GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("American Indian or Alaska Native Population",
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Asian or Pacific Islander Population", 
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Black or African American Population",
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
    ggplot(aes(x = Year, y = County,  fill = Total.Population)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("White Population",
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Racial Demographics by County") + 
    ylab("") + xlab("")

#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Fresno") %>% ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Fresno County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()

#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Kern") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Kern County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()

#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Tulare") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() +  labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Tulare County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()

#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "San Joaquin") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line()  + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in San Joaquin County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()

#Los Angeles, Comparison Group
MCH.CDC.Data_Race %>% group_by(County) %>% filter(County == "Los Angeles") %>%  ggplot(aes(Year, Total.Population, color = Mothers.Race)) + geom_line() + labs( y="Total Population (Log 10 Transformation)", title = "Racial Demographics in Los Angeles County (Transformed Y Axis)", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + scale_y_log10()
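The county line plots use `scale_y_log10()`, which suits groups whose sizes differ by orders of magnitude. On a log10 axis, equal ratios cover equal vertical distances, so a doubling looks the same for a group of 500 as for a group of 50,000. A quick check with invented numbers:

```r
# Two invented population series of very different size, each doubling
small <- c(500, 1000)
large <- c(50000, 100000)

# Vertical distance covered by each doubling on a log10 axis
step_small <- diff(log10(small))
step_large <- diff(log10(large))
print(c(step_small, step_large))   # both equal log10(2)
```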

Fertility Rate (Stratified By Race)

The next variable I examined was once again fertility rate, but this time stratified by race. I first visualized fertility rates across all counties over the years for each race in a tile plot. I then viewed trends in fertility rate by race in the same counties of interest: Fresno, Kern, Tulare, and San Joaquin (the top four pesticide-use counties, all in the San Joaquin Valley), and Los Angeles (the comparison county).

#Race and Fertility GGPLOTS
MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", limits = c(0,115),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate",limits = c(0,115),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", limits = c(0,115),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Fertility.Rate)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Fertility Rate", limits = c(0,115),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Fertility Rate by County for White Pop.") + 
    ylab("") + xlab("") 

#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)

#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)

#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)

#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)

#Los Angeles, Comparison Group
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Fertility.Rate, color = Mothers.Race)) + geom_line() + labs( y="Fertility Rate", title = "Race and Fertility Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + ylim(0,115)

Preterm Birth (Stratified By Race)

The next variable I examined was preterm birth, again stratified by race. I measured it as the average gestational age in weeks, based on the mother’s last menstrual period (LMP). I first visualized average gestational age across all counties over the years and then stratified the tile plots by race. I then viewed the trends in gestational age by race in the same counties of interest. These are Fresno, Kern, Tulare, and San Joaquin, the four San Joaquin Valley counties that consistently rank highest in pesticide use, and Los Angeles as a control/comparison county. On the county-level plots stratified by race, I also added a horizontal line at 37 weeks. This is the cutoff for defining preterm birth (birth before 37 completed weeks of gestation).
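The 37-week line is drawn over county averages, so an average above 37 weeks does not mean a county had no preterm births. The small base R illustration below uses made-up gestational ages for ten hypothetical births, not CDC WONDER data, to compare the mean with the share of births below the cutoff.

```r
# Made-up gestational ages (completed weeks) for ten hypothetical births
weeks <- c(33, 35, 36, 38, 39, 39, 40, 40, 41, 41)

mean_weeks <- mean(weeks)        # average gestational age
preterm <- weeks < 37            # preterm = born before 37 completed weeks
preterm_share <- mean(preterm)   # proportion of births that were preterm

cat(sprintf("Mean gestational age: %.1f weeks\n", mean_weeks))
cat(sprintf("Preterm births: %d of %d (%.0f%%)\n",
            sum(preterm), length(weeks), 100 * preterm_share))
```

Here the mean is 38.2 weeks, above the cutoff, yet 3 of the 10 births are preterm.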

#Preterm Birth GGPLOTS (With Race)
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Average.LMP.Gestational.Age)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Gestational Age (LMP) in Weeks", limits = c(36,40),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Gestational Age by County for White Pop.") + 
    ylab("") + xlab("") 

#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)

#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2) + ylim(36,40)

#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2) + ylim(36,40)

#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 37, size =2) + ylim(36,40)

#Los Angeles County, Comparison Group
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.LMP.Gestational.Age, color = Mothers.Race)) + geom_line() + labs( y="Average LMP Gestational Age (Weeks)", title = "Preterm Birth in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 37, size =2) + ylim(36,40)

#Preterm Birth Cutoff is 37 Weeks (Horizontal Line)

Birth Weight (Stratified By Race)

The next variable I examined was birth weight, measured as the average birth weight in grams and again stratified by race. I first visualized average birth weight across all counties over the years and then stratified the tile plots by race. I then viewed the trends in birth weight by race in the same counties of interest. These are Fresno, Kern, Tulare, and San Joaquin, the four San Joaquin Valley counties that consistently rank highest in pesticide use, and Los Angeles as a control/comparison county. On the county-level plots stratified by race, I also added a horizontal line at 2,500 grams. This is the cutoff for defining low birth weight (a birth weight below 2,500 grams).
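Low birth weight means less than 2,500 grams, so an infant weighing exactly 2,500 g is not low birth weight. The minimal base R sketch below classifies individual birth weights with `cut()`. It sets `right = FALSE` so that a weight exactly on the boundary lands in the correct category. The weights are made up for illustration.

```r
# Made-up birth weights in grams
grams <- c(1800, 2499, 2500, 3100, 3650)

# right = FALSE makes the intervals [-Inf, 2500) and [2500, Inf),
# so exactly 2500 g is classified as not low birth weight.
lbw_class <- cut(grams,
                 breaks = c(-Inf, 2500, Inf),
                 right = FALSE,
                 labels = c("LBW (<2500 g)", "Not LBW (>=2500 g)"))

table(lbw_class)
```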

#Birth weight GGPLOTS (With Race)
MCH.CDC.Data %>% 
    ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "American Indian or Alaska Native") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for American Indian or Alaska Native Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Asian or Pacific Islander") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for Asian or Pacific Islander Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "Black or African American") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for Black or African American Pop.") + 
    ylab("") + xlab("") 

MCH.CDC.Data_Race %>% filter(Mothers.Race == "White") %>%
ggplot(aes(x = Year, y = County,  fill = Average.Birth.Weight)) +
    geom_tile(color = "grey50") +
    scale_x_continuous(expand = c(0,0)) +
    scale_fill_gradientn("Average Birth Weight in Grams", limits = c(2400, 3600),
                         colors = brewer.pal(9, "Reds")) +
    theme_minimal() +  
    theme(panel.grid = element_blank()) +
    ggtitle("Average Birth Weight by County for White Pop.") + 
    ylab("") + xlab("") 

#Fresno, Ranked #1 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Fresno") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Fresno County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)

#Kern, Ranked #2 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Kern") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Kern County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)

#Tulare, Ranked #3 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "Tulare") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Tulare County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)

#San Joaquin, Ranked #4 in Pesticide Use
MCH.CDC.Data_Race %>% filter(County == "San Joaquin") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in San Joaquin County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database") + geom_hline(yintercept = 2500, size =2)+ ylim(2400, 3600)

#Los Angeles, Comparison Group
MCH.CDC.Data_Race %>% filter(County == "Los Angeles") %>% ggplot(aes(Year, Average.Birth.Weight, color = Mothers.Race)) + geom_line() + labs( y="Average Birth Weight (Grams)", title = "Birth Weight Data in Los Angeles County", subtitle = "2007-2019", color = "Mother's Race", caption = "Data Source: CDC WONDER Online Database")  + geom_hline(yintercept = 2500, size =2) + ylim(2400, 3600)

#LBW Cutoff is 2500 Grams (Horizontal Line)

#### Pesticide Data Wrangling

The

county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   COUNTY = col_character(),
##   LBS_2015 = col_double(),
##   RANK_2015 = col_double(),
##   LBS_2016 = col_double(),
##   RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMICAL = col_character(),
##   LBS_2007 = col_double(),
##   LBS_2008 = col_double(),
##   LBS_2009 = col_double(),
##   LBS_2010 = col_double(),
##   LBS_2011 = col_double(),
##   LBS_2012 = col_double(),
##   LBS_2013 = col_double(),
##   LBS_2014 = col_double(),
##   LBS_2015 = col_double(),
##   LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMNAME = col_character(),
##   ACRES_2007 = col_double(),
##   ACRES_2008 = col_double(),
##   ACRES_2009 = col_double(),
##   ACRES_2010 = col_double(),
##   ACRES_2011 = col_double(),
##   ACRES_2012 = col_double(),
##   ACRES_2013 = col_double(),
##   ACRES_2014 = col_double(),
##   ACRES_2015 = col_double(),
##   ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY, 
                                     lbs_2015 = LBS_2015, rank_2015 = RANK_2015, 
                                     lbs_2016 = LBS_2016, rank_2016 = RANK_2016)
# column 1 is the county
# columns 2-3 have the previous year data
# columns 4-5 have the current year data

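# keep columns 1-3 (county plus the previous year's lbs and rank): each report's
# previous-year figures are the most up-to-date values, so the 2007-2015 files
# supply 2006-2014, and table1_2016 (above) supplies 2015 and 2016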
all_dat <- list(read_csv("table1_2007.csv")[1:3],
                read_csv("table1_2008.csv")[1:3],
                read_csv("table1_2009.csv")[1:3],
                read_csv("table1_2010.csv")[1:3],
                read_csv("table1_2011.csv")[1:3],
                read_csv("table1_2012.csv")[1:3],
                read_csv("table1_2013.csv")[1:3],
                read_csv("table1_2014.csv")[1:3],
                read_csv("table1_2015.csv")[1:3])
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2006 = col_double(),
##   rank_2006 = col_double(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double(),
##   lbs_2015 = col_double(),
##   rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)


long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))

long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
#table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516 <- long_table2 %>% filter(str_starts(usage, "lbs"))

# keep only the digits so `usage` holds the year as a number (e.g. "lbs_2006" becomes 2006)
table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
# the two tables cover different years (2006-2014 and 2015-2016), so stack their rows
combined_pesticide_use <- bind_rows(table1_lbs, table1_lbs_1516)
class(combined_pesticide_use$usage) 
## [1] "numeric"
combined_pesticide_use <- combined_pesticide_use %>% group_by(usage) 
combined_pesticide_use <- combined_pesticide_use %>% arrange(usage)
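The wrangling above does three things: it keeps the previous-year columns from each annual report, joins the reports by county, and reshapes the `lbs_` columns into one row per county and year. The base R sketch below shows the same idea on two tiny made-up reports. The values and the helper name `stack_lbs` are hypothetical. Here `merge(..., all.x = TRUE)` stands in for `left_join`, and a manual stack stands in for `pivot_longer`.

```r
# Two made-up annual reports; each lists the previous and current year.
report_2007 <- data.frame(county = c("FRESNO", "KERN"),
                          lbs_2006 = c(100, 80), rank_2006 = c(1, 2),
                          lbs_2007 = c(110, 85), rank_2007 = c(1, 2),
                          stringsAsFactors = FALSE)
report_2008 <- data.frame(county = c("FRESNO", "KERN"),
                          lbs_2007 = c(112, 86), rank_2007 = c(1, 2),
                          lbs_2008 = c(120, 90), rank_2008 = c(1, 2),
                          stringsAsFactors = FALSE)

# Keep county plus the previous-year columns from each report.
reports <- lapply(list(report_2007, report_2008), function(r) r[, 1:3])

# Successive left joins on county, like Reduce(left_join, ...) above.
wide <- Reduce(function(x, y) merge(x, y, by = "county", all.x = TRUE), reports)

# Hypothetical helper: one row per county and year from the lbs_YYYY columns.
stack_lbs <- function(wide) {
  lbs_cols <- grep("^lbs_", names(wide), value = TRUE)
  long <- do.call(rbind, lapply(lbs_cols, function(col) {
    data.frame(county = wide$county,
               year = as.numeric(sub("^lbs_", "", col)),
               lbs = wide[[col]],
               stringsAsFactors = FALSE)
  }))
  long[order(long$year, long$county), ]
}

stack_lbs(wide)
```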

Low Birth Weight (LBW) Data from CDC (Wrangling for Use in the Shiny App)

This is my data wrangling process for the low birth weight data from the CDC WONDER database, which feeds the bar graph in my Shiny app. I looked only at low birth weight here. By default, the CDC WONDER live birth (natality) database only displays counties with populations greater than 100,000. Births from smaller counties appear under "Unidentified Counties".
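The code below drops CDC WONDER's extra rows by position (`-c(482:538)`), which breaks if the query or date range changes. A hedged alternative is to filter by content. The sketch rests on two assumptions. First, judging from the printed output, subtotal rows have a blank `County`. Second, footnote rows leave `Year` empty, which `read.delim()` reads as `NA`.

The sketch also shows one way the LBW counts could be joined to total births to give a percentage per county and year. The column name `LBW.Births`, both helper functions, and all data values are hypothetical. `Births` is the total-births column before it is renamed.

```r
# Keep only county-by-year data rows from a CDC WONDER export read with
# read.delim(). Assumes subtotal rows have a blank County and footnote rows
# have a missing Year (both are assumptions, not documented guarantees).
drop_wonder_extras <- function(df) {
  has_county <- !is.na(df$County) & nzchar(trimws(df$County))
  df[!is.na(df$Year) & has_county, , drop = FALSE]
}

# Hypothetical helper: percentage of births that were low birth weight,
# joining LBW counts to total births by Year and County.
lbw_percent <- function(lbw, totals) {
  merged <- merge(lbw, totals, by = c("Year", "County"))
  merged$LBW.Percent <- 100 * merged$LBW.Births / merged$Births
  merged
}

# Made-up rows in the shape of the printed output: two counties,
# one subtotal row (blank County), and one footnote row (missing Year).
totals <- data.frame(Year = c(2007, 2007, 2007, NA),
                     County = c("Fresno County, CA", "Kern County, CA", "", ""),
                     Births = c(1000, 800, 1800, NA),
                     stringsAsFactors = FALSE)
lbw <- data.frame(Year = c(2007, 2007),
                  County = c("Fresno County, CA", "Kern County, CA"),
                  LBW.Births = c(70, 60),
                  stringsAsFactors = FALSE)

lbw_percent(lbw, drop_wonder_extras(totals))
```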

#MCH_CDC Data for low birth weight 
#data wrangling mch cdc data
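cdc_lowbirthweight <- read.delim("MCH CDC Data.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
# drop rows 482-538, the footnotes CDC WONDER appends below the data
cdc_lowbirthweight <- cdc_lowbirthweight[-(482:538), ]
# drop the Notes column (1) and the WONDER code columns (3, 5, 7)
cdc_lowbirthweight <- cdc_lowbirthweight[, -c(1, 3, 5, 7)]
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
# keep Year, County, and Births (drop Notes, Year Code, and County Code)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[, -c(1, 3, 5)]
# assign the result so the renamed column persists, then print the table
MCH.CDC.Data.Total <- MCH.CDC.Data.Total %>% rename("Total Birth" = "Births")
MCH.CDC.Data.Total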
##     Year                     County Total Birth
## 1   2007         Alameda County, CA       21522
## 2   2007           Butte County, CA        2523
## 3   2007    Contra Costa County, CA       13487
## 4   2007       El Dorado County, CA        1882
## 5   2007          Fresno County, CA       17292
## 6   2007        Humboldt County, CA        1599
## 7   2007        Imperial County, CA        3146
## 8   2007            Kern County, CA       15336
## 9   2007           Kings County, CA        2781
## 10  2007     Los Angeles County, CA      151908
## 11  2007          Madera County, CA        2612
## 12  2007           Marin County, CA        2820
## 13  2007          Merced County, CA        4652
## 14  2007        Monterey County, CA        7551
## 15  2007            Napa County, CA        1665
## 16  2007          Orange County, CA       44038
## 17  2007          Placer County, CA        4054
## 18  2007       Riverside County, CA       34563
## 19  2007      Sacramento County, CA       22119
## 20  2007  San Bernardino County, CA       35190
## 21  2007       San Diego County, CA       47569
## 22  2007   San Francisco County, CA        9129
## 23  2007     San Joaquin County, CA       11600
## 24  2007 San Luis Obispo County, CA        2884
## 25  2007       San Mateo County, CA        9914
## 26  2007   Santa Barbara County, CA        6292
## 27  2007     Santa Clara County, CA       27490
## 28  2007      Santa Cruz County, CA        3571
## 29  2007          Shasta County, CA        2230
## 30  2007          Solano County, CA        5849
## 31  2007          Sonoma County, CA        5742
## 32  2007      Stanislaus County, CA        8827
## 33  2007          Tulare County, CA        8507
## 34  2007         Ventura County, CA       12198
## 35  2007            Yolo County, CA        2522
## 36  2007  Unidentified Counties, CA       11350
## 37  2007                                 566414
## 38  2008         Alameda County, CA       20976
## 39  2008           Butte County, CA        2520
## 40  2008    Contra Costa County, CA       13135
## 41  2008       El Dorado County, CA        1814
## 42  2008          Fresno County, CA       16764
## 43  2008        Humboldt County, CA        1601
## 44  2008        Imperial County, CA        3241
## 45  2008            Kern County, CA       15316
## 46  2008           Kings County, CA        2711
## 47  2008     Los Angeles County, CA      147745
## 48  2008          Madera County, CA        2535
## 49  2008           Marin County, CA        2719
## 50  2008          Merced County, CA        4422
## 51  2008        Monterey County, CA        7435
## 52  2008            Napa County, CA        1671
## 53  2008          Orange County, CA       42467
## 54  2008          Placer County, CA        4035
## 55  2008       Riverside County, CA       32881
## 56  2008      Sacramento County, CA       21397
## 57  2008  San Bernardino County, CA       33837
## 58  2008       San Diego County, CA       46755
## 59  2008   San Francisco County, CA        9106
## 60  2008     San Joaquin County, CA       11030
## 61  2008 San Luis Obispo County, CA        2739
## 62  2008       San Mateo County, CA        9770
## 63  2008   Santa Barbara County, CA        6320
## 64  2008     Santa Clara County, CA       26731
## 65  2008      Santa Cruz County, CA        3537
## 66  2008          Shasta County, CA        2186
## 67  2008          Solano County, CA        5609
## 68  2008          Sonoma County, CA        5763
## 69  2008      Stanislaus County, CA        8550
## 70  2008          Tulare County, CA        8535
## 71  2008         Ventura County, CA       12075
## 72  2008            Yolo County, CA        2669
## 73  2008  Unidentified Counties, CA       11182
## 74  2008                                 551779
## 75  2009         Alameda County, CA       20325
## 76  2009           Butte County, CA        2440
## 77  2009    Contra Costa County, CA       12686
## 78  2009       El Dorado County, CA        1727
## 79  2009          Fresno County, CA       16271
## 80  2009        Humboldt County, CA        1541
## 81  2009        Imperial County, CA        3151
## 82  2009            Kern County, CA       14828
## 83  2009           Kings County, CA        2645
## 84  2009     Los Angeles County, CA      139757
## 85  2009          Madera County, CA        2390
## 86  2009           Marin County, CA        2496
## 87  2009          Merced County, CA        4407
## 88  2009        Monterey County, CA        7070
## 89  2009            Napa County, CA        1653
## 90  2009          Orange County, CA       40437
## 91  2009          Placer County, CA        3810
## 92  2009       Riverside County, CA       31605
## 93  2009      Sacramento County, CA       20433
## 94  2009  San Bernardino County, CA       32006
## 95  2009       San Diego County, CA       44982
## 96  2009   San Francisco County, CA        8810
## 97  2009     San Joaquin County, CA       10876
## 98  2009 San Luis Obispo County, CA        2617
## 99  2009       San Mateo County, CA        9452
## 100 2009   Santa Barbara County, CA        6041
## 101 2009     Santa Clara County, CA       25203
## 102 2009      Santa Cruz County, CA        3299
## 103 2009          Shasta County, CA        2068
## 104 2009          Solano County, CA        5393
## 105 2009          Sonoma County, CA        5685
## 106 2009      Stanislaus County, CA        7942
## 107 2009          Tulare County, CA        8361
## 108 2009         Ventura County, CA       11360
## 109 2009            Yolo County, CA        2483
## 110 2009  Unidentified Counties, CA       10770
## 111 2009                                 527020
## 112 2010         Alameda County, CA       19306
## 113 2010           Butte County, CA        2457
## 114 2010    Contra Costa County, CA       12358
## 115 2010       El Dorado County, CA        1621
## 116 2010          Fresno County, CA       16283
## 117 2010        Humboldt County, CA        1551
## 118 2010        Imperial County, CA        3081
## 119 2010            Kern County, CA       14419
## 120 2010           Kings County, CA        2509
## 121 2010     Los Angeles County, CA      133252
## 122 2010          Madera County, CA        2434
## 123 2010           Marin County, CA        2371
## 124 2010          Merced County, CA        4249
## 125 2010        Monterey County, CA        6765
## 126 2010            Napa County, CA        1525
## 127 2010          Orange County, CA       38250
## 128 2010          Placer County, CA        3825
## 129 2010       Riverside County, CA       30670
## 130 2010      Sacramento County, CA       20056
## 131 2010  San Bernardino County, CA       31368
## 132 2010       San Diego County, CA       44867
## 133 2010   San Francisco County, CA        8806
## 134 2010     San Joaquin County, CA       10596
## 135 2010 San Luis Obispo County, CA        2735
## 136 2010       San Mateo County, CA        9194
## 137 2010   Santa Barbara County, CA        5821
## 138 2010     Santa Clara County, CA       23940
## 139 2010      Santa Cruz County, CA        3192
## 140 2010          Shasta County, CA        2136
## 141 2010          Solano County, CA        5050
## 142 2010          Sonoma County, CA        5393
## 143 2010      Stanislaus County, CA        7806
## 144 2010          Tulare County, CA        8155
## 145 2010         Ventura County, CA       11150
## 146 2010            Yolo County, CA        2427
## 147 2010  Unidentified Counties, CA       10580
## 148 2010                                 510198
## 149 2011         Alameda County, CA       19003
## 150 2011           Butte County, CA        2391
## 151 2011    Contra Costa County, CA       12060
## 152 2011       El Dorado County, CA        1630
## 153 2011          Fresno County, CA       16160
## 154 2011        Humboldt County, CA        1448
## 155 2011        Imperial County, CA        3079
## 156 2011            Kern County, CA       14287
## 157 2011           Kings County, CA        2567
## 158 2011     Los Angeles County, CA      130370
## 159 2011          Madera County, CA        2401
## 160 2011           Marin County, CA        2386
## 161 2011          Merced County, CA        4280
## 162 2011        Monterey County, CA        6812
## 163 2011            Napa County, CA        1572
## 164 2011          Orange County, CA       38101
## 165 2011          Placer County, CA        3834
## 166 2011       Riverside County, CA       30611
## 167 2011      Sacramento County, CA       20002
## 168 2011  San Bernardino County, CA       30566
## 169 2011       San Diego County, CA       43643
## 170 2011   San Francisco County, CA        8813
## 171 2011     San Joaquin County, CA       10329
## 172 2011 San Luis Obispo County, CA        2631
## 173 2011       San Mateo County, CA        9048
## 174 2011   Santa Barbara County, CA        5804
## 175 2011     Santa Clara County, CA       23649
## 176 2011      Santa Cruz County, CA        3233
## 177 2011          Shasta County, CA        2022
## 178 2011          Solano County, CA        5160
## 179 2011          Sonoma County, CA        5150
## 180 2011      Stanislaus County, CA        7738
## 181 2011          Tulare County, CA        7966
## 182 2011         Ventura County, CA       10656
## 183 2011            Yolo County, CA        2341
## 184 2011  Unidentified Counties, CA       10377
## 185 2011                                 502120
## 186 2012         Alameda County, CA       19546
## 187 2012           Butte County, CA        2399
## 188 2012    Contra Costa County, CA       12065
## 189 2012       El Dorado County, CA        1513
## 190 2012          Fresno County, CA       15955
## 191 2012        Humboldt County, CA        1504
## 192 2012        Imperial County, CA        3054
## 193 2012            Kern County, CA       14553
## 194 2012           Kings County, CA        2358
## 195 2012     Los Angeles County, CA      131664
## 196 2012          Madera County, CA        2257
## 197 2012           Marin County, CA        2305
## 198 2012          Merced County, CA        4312
## 199 2012        Monterey County, CA        6652
## 200 2012            Napa County, CA        1431
## 201 2012          Orange County, CA       38183
## 202 2012          Placer County, CA        3648
## 203 2012       Riverside County, CA       30300
## 204 2012      Sacramento County, CA       19623
## 205 2012  San Bernardino County, CA       30701
## 206 2012       San Diego County, CA       44396
## 207 2012   San Francisco County, CA        9075
## 208 2012     San Joaquin County, CA       10129
## 209 2012 San Luis Obispo County, CA        2580
## 210 2012       San Mateo County, CA        9185
## 211 2012   Santa Barbara County, CA        5585
## 212 2012     Santa Clara County, CA       24308
## 213 2012      Santa Cruz County, CA        3083
## 214 2012          Shasta County, CA        2109
## 215 2012          Solano County, CA        5062
## 216 2012          Sonoma County, CA        5143
## 217 2012      Stanislaus County, CA        7591
## 218 2012          Tulare County, CA        8000
## 219 2012         Ventura County, CA       10641
## 220 2012            Yolo County, CA        2451
## 221 2012  Unidentified Counties, CA       10394
## 222 2012                                 503755
## 223 2013         Alameda County, CA       19257
## 224 2013           Butte County, CA        2415
## 225 2013    Contra Costa County, CA       12154
## 226 2013       El Dorado County, CA        1534
## 227 2013          Fresno County, CA       15737
## 228 2013        Humboldt County, CA        1531
## 229 2013        Imperial County, CA        3094
## 230 2013            Kern County, CA       14149
## 231 2013           Kings County, CA        2394
## 232 2013     Los Angeles County, CA      128598
## 233 2013          Madera County, CA        2315
## 234 2013           Marin County, CA        2321
## 235 2013          Merced County, CA        4162
## 236 2013        Monterey County, CA        6547
## 237 2013            Napa County, CA        1450
## 238 2013          Orange County, CA       37281
## 239 2013          Placer County, CA        3688
## 240 2013       Riverside County, CA       29941
## 241 2013      Sacramento County, CA       19371
## 242 2013  San Bernardino County, CA       30246
## 243 2013       San Diego County, CA       43659
## 244 2013   San Francisco County, CA        8814
## 245 2013     San Joaquin County, CA        9800
## 246 2013 San Luis Obispo County, CA        2650
## 247 2013       San Mateo County, CA        8824
## 248 2013   Santa Barbara County, CA        5755
## 249 2013     Santa Clara County, CA       23313
## 250 2013      Santa Cruz County, CA        2871
## 251 2013          Shasta County, CA        2143
## 252 2013          Solano County, CA        5259
## 253 2013          Sonoma County, CA        4983
## 254 2013      Stanislaus County, CA        7579
## 255 2013          Tulare County, CA        7653
## 256 2013         Ventura County, CA       10446
## 257 2013            Yolo County, CA        2491
## 258 2013  Unidentified Counties, CA       10280
## 259 2013                                 494705
## 260 2014         Alameda County, CA       19650
## 261 2014           Butte County, CA        2481
## 262 2014    Contra Costa County, CA       12557
## 263 2014       El Dorado County, CA        1618
## 264 2014          Fresno County, CA       15762
## 265 2014        Humboldt County, CA        1468
## 266 2014        Imperial County, CA        3226
## 267 2014            Kern County, CA       14193
## 268 2014           Kings County, CA        2350
## 269 2014     Los Angeles County, CA      130289
## 270 2014          Madera County, CA        2313
## 271 2014           Marin County, CA        2401
## 272 2014          Merced County, CA        4164
## 273 2014        Monterey County, CA        6455
## 274 2014            Napa County, CA        1475
## 275 2014          Orange County, CA       38595
## 276 2014          Placer County, CA        3631
## 277 2014       Riverside County, CA       30235
## 278 2014      Sacramento County, CA       19871
## 279 2014  San Bernardino County, CA       31226
## 280 2014       San Diego County, CA       44596
## 281 2014   San Francisco County, CA        9104
## 282 2014     San Joaquin County, CA       10113
## 283 2014 San Luis Obispo County, CA        2596
## 284 2014       San Mateo County, CA        9083
## 285 2014   Santa Barbara County, CA        5830
## 286 2014     Santa Clara County, CA       23742
## 287 2014      Santa Cruz County, CA        3069
## 288 2014          Shasta County, CA        2083
## 289 2014          Solano County, CA        5253
## 290 2014          Sonoma County, CA        5070
## 291 2014      Stanislaus County, CA        7511
## 292 2014          Tulare County, CA        7640
## 293 2014         Ventura County, CA       10468
## 294 2014            Yolo County, CA        2394
## 295 2014  Unidentified Counties, CA       10367
## 296 2014                                 502879
## 297 2015         Alameda County, CA       19434
## 298 2015           Butte County, CA        2442
## 299 2015    Contra Costa County, CA       12596
## 300 2015       El Dorado County, CA        1594
## 301 2015          Fresno County, CA       15359
## 302 2015        Humboldt County, CA        1441
## 303 2015        Imperial County, CA        3168
## 304 2015            Kern County, CA       13768
## 305 2015           Kings County, CA        2274
## 306 2015     Los Angeles County, CA      124641
## 307 2015          Madera County, CA        2225
## 308 2015           Marin County, CA        2288
## 309 2015          Merced County, CA        4104
## 310 2015        Monterey County, CA        6420
## 311 2015            Napa County, CA        1457
## 312 2015          Orange County, CA       37609
## 313 2015          Placer County, CA        3747
## 314 2015       Riverside County, CA       30491
## 315 2015      Sacramento County, CA       19423
## 316 2015  San Bernardino County, CA       30530
## 317 2015       San Diego County, CA       43942
## 318 2015   San Francisco County, CA        8972
## 319 2015     San Joaquin County, CA        9983
## 320 2015 San Luis Obispo County, CA        2668
## 321 2015       San Mateo County, CA        9037
## 322 2015   Santa Barbara County, CA        5673
## 323 2015     Santa Clara County, CA       23393
## 324 2015      Santa Cruz County, CA        2840
## 325 2015          Shasta County, CA        2073
## 326 2015          Solano County, CA        5131
## 327 2015          Sonoma County, CA        5015
## 328 2015      Stanislaus County, CA        7698
## 329 2015          Tulare County, CA        7411
## 330 2015         Ventura County, CA       10060
## 331 2015            Yolo County, CA        2402
## 332 2015  Unidentified Counties, CA       10439
## 333 2015                                 491748
## 334 2016         Alameda County, CA       19573
## 335 2016           Butte County, CA        2490
## 336 2016    Contra Costa County, CA       12340
## 337 2016       El Dorado County, CA        1601
## 338 2016          Fresno County, CA       15129
## 339 2016        Humboldt County, CA        1482
## 340 2016        Imperial County, CA        2939
## 341 2016            Kern County, CA       13728
## 342 2016           Kings County, CA        2248
## 343 2016     Los Angeles County, CA      123092
## 344 2016          Madera County, CA        2355
## 345 2016           Marin County, CA        2252
## 346 2016          Merced County, CA        4117
## 347 2016        Monterey County, CA        6219
## 348 2016            Napa County, CA        1406
## 349 2016          Orange County, CA       38106
## 350 2016          Placer County, CA        3732
## 351 2016       Riverside County, CA       30661
## 352 2016      Sacramento County, CA       19588
## 353 2016  San Bernardino County, CA       31032
## 354 2016       San Diego County, CA       42720
## 355 2016   San Francisco County, CA        9062
## 356 2016     San Joaquin County, CA       10268
## 357 2016 San Luis Obispo County, CA        2581
## 358 2016       San Mateo County, CA        8960
## 359 2016   Santa Barbara County, CA        5501
## 360 2016     Santa Clara County, CA       23042
## 361 2016      Santa Cruz County, CA        2799
## 362 2016          Shasta County, CA        2048
## 363 2016          Solano County, CA        5259
## 364 2016          Sonoma County, CA        4962
## 365 2016      Stanislaus County, CA        7862
## 366 2016          Tulare County, CA        7146
## 367 2016         Ventura County, CA        9592
## 368 2016            Yolo County, CA        2423
## 369 2016  Unidentified Counties, CA       10512
## 370 2016                                 488827
## 371 2017         Alameda County, CA       18888
## 372 2017           Butte County, CA        2386
## 373 2017    Contra Costa County, CA       12180
## 374 2017       El Dorado County, CA        1570
## 375 2017          Fresno County, CA       14541
## 376 2017        Humboldt County, CA        1372
## 377 2017        Imperial County, CA        2894
## 378 2017            Kern County, CA       13326
## 379 2017           Kings County, CA        2373
## 380 2017     Los Angeles County, CA      116950
## 381 2017          Madera County, CA        2120
## 382 2017           Marin County, CA        2237
## 383 2017          Merced County, CA        4202
## 384 2017        Monterey County, CA        5810
## 385 2017            Napa County, CA        1291
## 386 2017          Orange County, CA       37369
## 387 2017          Placer County, CA        3689
## 388 2017       Riverside County, CA       29857
## 389 2017      Sacramento County, CA       19202
## 390 2017  San Bernardino County, CA       29643
## 391 2017       San Diego County, CA       41230
## 392 2017   San Francisco County, CA        8947
## 393 2017     San Joaquin County, CA        9928
## 394 2017 San Luis Obispo County, CA        2550
## 395 2017       San Mateo County, CA        8585
## 396 2017   Santa Barbara County, CA        5531
## 397 2017     Santa Clara County, CA       22133
## 398 2017      Santa Cruz County, CA        2658
## 399 2017          Shasta County, CA        2008
## 400 2017          Solano County, CA        5131
## 401 2017          Sonoma County, CA        4642
## 402 2017      Stanislaus County, CA        7441
## 403 2017          Tulare County, CA        7130
## 404 2017         Ventura County, CA        9318
## 405 2017            Yolo County, CA        2272
## 406 2017  Unidentified Counties, CA       10254
## 407 2017                                 471658
## 408 2018         Alameda County, CA       18240
## 409 2018           Butte County, CA        2430
## 410 2018    Contra Costa County, CA       12002
## 411 2018       El Dorado County, CA        1674
## 412 2018          Fresno County, CA       14465
## 413 2018        Humboldt County, CA        1364
## 414 2018        Imperial County, CA        2629
## 415 2018            Kern County, CA       12916
## 416 2018           Kings County, CA        2262
## 417 2018     Los Angeles County, CA      110271
## 418 2018          Madera County, CA        2079
## 419 2018           Marin County, CA        2127
## 420 2018          Merced County, CA        3875
## 421 2018        Monterey County, CA        5895
## 422 2018            Napa County, CA        1204
## 423 2018          Orange County, CA       35679
## 424 2018          Placer County, CA        3663
## 425 2018       Riverside County, CA       28725
## 426 2018      Sacramento County, CA       19102
## 427 2018  San Bernardino County, CA       28994
## 428 2018       San Diego County, CA       40070
## 429 2018   San Francisco County, CA        8697
## 430 2018     San Joaquin County, CA        9841
## 431 2018 San Luis Obispo County, CA        2445
## 432 2018       San Mateo County, CA        8330
## 433 2018   Santa Barbara County, CA        5268
## 434 2018     Santa Clara County, CA       21292
## 435 2018      Santa Cruz County, CA        2449
## 436 2018          Shasta County, CA        1966
## 437 2018          Solano County, CA        5033
## 438 2018          Sonoma County, CA        4526
## 439 2018      Stanislaus County, CA        7364
## 440 2018          Tulare County, CA        6905
## 441 2018         Ventura County, CA        9065
## 442 2018            Yolo County, CA        2135
## 443 2018  Unidentified Counties, CA        9938
## 444 2018                                 454920
## 445 2019         Alameda County, CA       18212
## 446 2019           Butte County, CA        2154
## 447 2019    Contra Costa County, CA       11729
## 448 2019       El Dorado County, CA        1524
## 449 2019          Fresno County, CA       14057
## 450 2019        Humboldt County, CA        1417
## 451 2019        Imperial County, CA        2533
## 452 2019            Kern County, CA       12765
## 453 2019           Kings County, CA        2115
## 454 2019     Los Angeles County, CA      107231
## 455 2019          Madera County, CA        2045
## 456 2019           Marin County, CA        2071
## 457 2019          Merced County, CA        3806
## 458 2019        Monterey County, CA        5846
## 459 2019            Napa County, CA        1279
## 460 2019          Orange County, CA       35052
## 461 2019          Placer County, CA        3658
## 462 2019       Riverside County, CA       28026
## 463 2019      Sacramento County, CA       19089
## 464 2019  San Bernardino County, CA       28656
## 465 2019       San Diego County, CA       38540
## 466 2019   San Francisco County, CA        8438
## 467 2019     San Joaquin County, CA       10009
## 468 2019 San Luis Obispo County, CA        2447
## 469 2019       San Mateo County, CA        8206
## 470 2019   Santa Barbara County, CA        5537
## 471 2019     Santa Clara County, CA       21184
## 472 2019      Santa Cruz County, CA        2434
## 473 2019          Shasta County, CA        1903
## 474 2019          Solano County, CA        5065
## 475 2019          Sonoma County, CA        4395
## 476 2019      Stanislaus County, CA        7402
## 477 2019          Tulare County, CA        6714
## 478 2019         Ventura County, CA        8736
## 479 2019            Yolo County, CA        2057
## 480 2019  Unidentified Counties, CA       10147
## 481 2019                                 446479
## 482   NA                                6512502
## 483   NA                                     NA
## 484   NA                                     NA
## 485   NA                                     NA
## 486   NA                                     NA
## 487   NA                                     NA
## 488   NA                                     NA
## 489   NA                                     NA
## 490   NA                                     NA
## 491   NA                                     NA
## 492   NA                                     NA
## 493   NA                                     NA
## 494   NA                                     NA
## 495   NA                                     NA
## 496   NA                                     NA
## 497   NA                                     NA
## 498   NA                                     NA
## 499   NA                                     NA
## 500   NA                                     NA
## 501   NA                                     NA
## 502   NA                                     NA
## 503   NA                                     NA
## 504   NA                                     NA
## 505   NA                                     NA
## 506   NA                                     NA
## 507   NA                                     NA
## 508   NA                                     NA
## 509   NA                                     NA
## 510   NA                                     NA
## 511   NA                                     NA
## 512   NA                                     NA
## 513   NA                                     NA
## 514   NA                                     NA
## 515   NA                                     NA
## 516   NA                                     NA
## 517   NA                                     NA
## 518   NA                                     NA
## 519   NA                                     NA
## 520   NA                                     NA
## 521   NA                                     NA
## 522   NA                                     NA
## 523   NA                                     NA
#The data I downloaded did not include the total number of births, so I merged two datasets: one with total birth counts and one with low birth weight + very low birth weight counts
df1 <- full_join(cdc_lowbirthweight , MCH.CDC.Data.Total, by=c("Year", "County"))
df1<- df1 %>% rename("cases" = "Births.x", "total_births" = "Births.y")

#Note: cases = low birth weight + very low birth weight counts; total_births = total number of births
col_order <- c("Year", "County", "total_births",
               "cases", "Average.Birth.Weight", "Standard.Deviation.for.Average.Birth.Weight",
               "Average.Age.of.Mother", "Standard.Deviation.for.Average.Age.of.Mother","Average.LMP.Gestational.Age",
               "Standard.Deviation.for.Average.LMP.Gestational.Age")
df2 <- df1[,col_order]
# Strip the " County, CA" suffix and convert to title case (e.g. "Los Angeles County, CA" -> "Los Angeles")
# so the names match the county names in the pesticide tables
df2 <- df2 %>%
  mutate(County = str_to_title(str_remove(County, " County, CA$"))) %>%
  # Keep rows with both counts, then compute low birth weight births per 100 live births (%)
  filter(!is.na(total_births), !is.na(cases)) %>%
  mutate(rate = cases / total_births * 100)
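To make the merge and rate steps concrete, here is a minimal sketch on a hypothetical two-county example. The counts and county names are made up and are not CDC WONDER values. Because both inputs carry a non-key `Births` column, `full_join()` keeps both and suffixes them `.x` (left table) and `.y` (right table), which is why the code renames `Births.x` and `Births.y`. The rate is then low birth weight births per 100 live births.

```r
library(dplyr)

# Hypothetical counts for illustration only (not real CDC WONDER values)
lbw <- tibble(Year = c(2011L, 2011L),
              County = c("A County, CA", "B County, CA"),
              Births = c(150, 60))        # low birth weight + very low birth weight births
totals <- tibble(Year = c(2011L, 2011L),
                 County = c("A County, CA", "B County, CA"),
                 Births = c(2000, 1000))  # all live births

# Both tables have a non-key "Births" column, so full_join() suffixes them .x (left) and .y (right)
merged <- full_join(lbw, totals, by = c("Year", "County")) %>%
  rename(cases = Births.x, total_births = Births.y) %>%
  mutate(rate = cases / total_births * 100)  # LBW births per 100 live births (%)

merged$rate  # 7.5 6.0
```

A full join keeps county-years that appear in only one file and fills the missing count with `NA`. That is why rows with a missing count are filtered out before the rate is used.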
Preterm Birth Data from the CDC WONDER Database (Wrangling for Use in the Shiny App):

This section wrangles the preterm birth data from the CDC WONDER database. By default, the CDC WONDER live birth database only reports individual counties with a population greater than 100,000. Births in smaller counties are pooled into “Unidentified Counties”. I only looked at preterm birth here, for the bar graph in my Shiny app.
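Because smaller counties are pooled, the export mixes three kinds of rows: individual counties, the pooled “Unidentified Counties” row, and total rows with a blank County. Below is a minimal sketch of keeping only named-county rows, using a few values copied from the printed births table above. The filter conditions are my assumption about the export layout and are not a step in the original workflow. In the workflow itself, these rows should simply fail to match any county in the pesticide tables and drop out at the inner join.

```r
library(dplyr)

# Rows copied from the printed births table: a named county, the pooled
# "Unidentified Counties" row, a yearly total (blank County) and the grand total (missing Year)
births <- tibble(
  Year   = c(2011L, 2011L, 2011L, NA),
  County = c("Kern County, CA", "Unidentified Counties, CA", "", ""),
  Births = c(14287, 10377, 502120, 6512502)
)

# Keep only rows that describe a single named county
county_rows <- births %>%
  filter(!is.na(Year), County != "", County != "Unidentified Counties, CA")

county_rows$County  # "Kern County, CA"
```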

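# Read the preterm birth counts exported from CDC WONDER (tab-delimited)
cdc_pretermbirth <- read.delim("Preterm birth.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
# Drop rows 422-472, the footnote lines at the end of the export
cdc_pretermbirth <- cdc_pretermbirth[-c(422:472), ]
# Drop columns 1, 3 and 5 (in a standard CDC WONDER export these are Notes, Year.Code and County.Code)
cdc_pretermbirth <- cdc_pretermbirth[, -c(1, 3, 5)]
cdc_pretermbirth <- cdc_pretermbirth %>% rename("Events" = "Births")
# Read total live births by county and year, drop the same columns, and rename the count
MCH.CDC.Data.Total <- read.delim("MCH CDC Data Total.txt", sep = "\t", dec = ".", header = TRUE, stringsAsFactors = FALSE)
MCH.CDC.Data.Total <- MCH.CDC.Data.Total[, -c(1, 3, 5)]
MCH.CDC.Data.Total <- MCH.CDC.Data.Total %>% rename("total_birth" = "Births")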
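Removing rows 422 to 472 by position depends on exactly how many data rows a particular query returns. Below is a minimal sketch of a content-based alternative. It assumes, as the NA rows at the end of the printed births table suggest, that footnote rows have no birth count. The column names follow a standard CDC WONDER export, but the rows themselves are hypothetical.

```r
library(dplyr)

# Hypothetical miniature of a CDC WONDER export after read.delim(): two data rows,
# the grand "Total" row, then footnote rows whose count column is empty (NA)
raw <- data.frame(
  Notes  = c("", "", "Total", "---", "Dataset: Natality"),
  Year   = c(2011L, 2011L, NA, NA, NA),
  County = c("Kern County, CA", "Kings County, CA", "", "", ""),
  Births = c(14287, 2567, 16854, NA, NA),
  stringsAsFactors = FALSE
)

# Drop footnotes by content (missing count) instead of by row position,
# then drop the Notes column by name rather than by index
clean <- raw %>%
  filter(!is.na(Births)) %>%
  select(-Notes)

nrow(clean)  # 3
```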

df1_pt <- full_join(cdc_pretermbirth , MCH.CDC.Data.Total, by=c("Year", "County"))

# Strip the " County, CA" suffix and convert to title case so the names match the pesticide tables
df1_pt <- df1_pt %>%
  mutate(County = str_to_title(str_remove(County, " County, CA$"))) %>%
  # Keep rows with both counts, then compute preterm births per 100 live births (%)
  filter(!is.na(total_birth), !is.na(Events)) %>%
  mutate(rate = Events / total_birth * 100)
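In dplyr, a name inside `filter()` is only treated as a column when it is unquoted. A quoted name is just a character string. This minimal sketch on a hypothetical two-row tibble shows why the missing-total filter must refer to `total_birth` without quotes.

```r
library(dplyr)

# Hypothetical toy data: the second row has a missing total
df <- tibble(total_birth = c(1000, NA), Events = c(80, 90))

# Quoted, "total_birth" is a one-element character constant that is never NA,
# so the condition is TRUE for every row and nothing is dropped
kept_quoted <- filter(df, !is.na("total_birth"))

# Unquoted, total_birth refers to the column, so the row with a missing total is dropped
kept_unquoted <- filter(df, !is.na(total_birth))

nrow(kept_quoted)    # 2
nrow(kept_unquoted)  # 1
```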

Joining the CDC WONDER Live Birth Data (Low Birth Weight and Preterm Birth) with the Pesticide Data

I then joined the CDC WONDER data (low birth weight and preterm birth) with Zainab’s wrangled pesticide data to create a joint dataset. I then generated bar graphs to visualize the trends from 2007 to 2016 (please see the Shiny app). We noticed that Fresno and Kern counties used the highest amounts of pesticide. Both lie in the San Joaquin Valley, a highly productive agricultural region.

county_ranks16 <- read_delim("table1_county_rank_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   COUNTY = col_character(),
##   LBS_2015 = col_double(),
##   RANK_2015 = col_double(),
##   LBS_2016 = col_double(),
##   RANK_2016 = col_double()
## )
repro_lbs16 <- read_delim("table3_reproductive_lbs_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMICAL = col_character(),
##   LBS_2007 = col_double(),
##   LBS_2008 = col_double(),
##   LBS_2009 = col_double(),
##   LBS_2010 = col_double(),
##   LBS_2011 = col_double(),
##   LBS_2012 = col_double(),
##   LBS_2013 = col_double(),
##   LBS_2014 = col_double(),
##   LBS_2015 = col_double(),
##   LBS_2016 = col_double()
## )
repro_acre16 <- read_delim("table4_reproductive_acres_2016.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   CHEMNAME = col_character(),
##   ACRES_2007 = col_double(),
##   ACRES_2008 = col_double(),
##   ACRES_2009 = col_double(),
##   ACRES_2010 = col_double(),
##   ACRES_2011 = col_double(),
##   ACRES_2012 = col_double(),
##   ACRES_2013 = col_double(),
##   ACRES_2014 = col_double(),
##   ACRES_2015 = col_double(),
##   ACRES_2016 = col_double()
## )
table1_2016 <- county_ranks16 %>% transmute(county = COUNTY, 
                                            lbs_2015 = LBS_2015, rank_2015 = RANK_2015, 
                                            lbs_2016 = LBS_2016, rank_2016 = RANK_2016)

# Each yearly table1_YYYY.csv reports the previous and current year; keep its first three
# columns (county, lbs and rank for the previous year), i.e. 2006 through 2014
all_dat <- lapply(2007:2015, function(yr) read_csv(paste0("table1_", yr, ".csv"))[1:3])
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2006 = col_double(),
##   rank_2006 = col_double(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2007 = col_double(),
##   rank_2007 = col_double(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2008 = col_double(),
##   rank_2008 = col_double(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2009 = col_double(),
##   rank_2009 = col_double(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2010 = col_double(),
##   rank_2010 = col_double(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2011 = col_double(),
##   rank_2011 = col_double(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2012 = col_double(),
##   rank_2012 = col_double(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2013 = col_double(),
##   rank_2013 = col_double(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double()
## )
## 
## -- Column specification ----------------------------------------------------------------------------------------
## cols(
##   county = col_character(),
##   lbs_2014 = col_double(),
##   rank_2014 = col_double(),
##   lbs_2015 = col_double(),
##   rank_2015 = col_double()
## )
table1 <- Reduce(function(x, y) left_join(x, y, by = "county"), all_dat)
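`Reduce()` folds the list of yearly tables into one wide table by joining them pairwise, from left to right. Here is a minimal sketch with three small hypothetical tables (the pounds are made up).

```r
library(dplyr)

# Hypothetical yearly tables sharing a "county" key (values are made up)
t1 <- tibble(county = c("Fresno", "Kern"), lbs_2006 = c(10, 20))
t2 <- tibble(county = c("Fresno", "Kern"), lbs_2007 = c(11, 21))
t3 <- tibble(county = c("Fresno", "Kern"), lbs_2008 = c(12, 22))

# Reduce() folds the list left to right: left_join(left_join(t1, t2), t3)
wide <- Reduce(function(x, y) left_join(x, y, by = "county"), list(t1, t2, t3))
names(wide)
```

Note that `left_join()` keeps only the counties listed in the first table (`table1_2007.csv` here). A county that first appears in a later file would need `full_join()` to be kept.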

long_table1 <- table1 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks <- long_table1 %>% filter(str_starts(usage, "rank"))
table1_lbs <- long_table1 %>% filter(str_starts(usage, "lbs"))
table1_lbs$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs$usage))
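The pivot turns each `lbs_YYYY`/`rank_YYYY` column into a row. The `gsub()` call then deletes every run of non-digit characters from the column name, so that `usage` holds the year as a number, which is what the later joins match against `Year`. Here is a minimal sketch on a hypothetical one-county table.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical one-county wide table in the same layout as table1 (values made up)
wide <- tibble(county = "Kern", lbs_2006 = 100, rank_2006 = 2, lbs_2007 = 110, rank_2007 = 1)

lbs_long <- wide %>%
  pivot_longer(!county, names_to = "usage", values_to = "value") %>%
  filter(str_starts(usage, "lbs")) %>%
  # Delete every run of non-digit characters, leaving only the year: "lbs_2006" -> "2006"
  mutate(usage = as.numeric(gsub("[^[:digit:]]+", "", usage)))

lbs_long$usage  # 2006 2007
```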

long_table2 <- table1_2016 %>% pivot_longer(!county, names_to = 'usage', values_to = "value")
table1_ranks_1516 <- long_table2 %>% filter(str_starts(usage, "rank"))
table1_lbs_1516<- long_table2 %>% filter(str_starts(usage, "lbs"))


table1_lbs_1516$usage <- as.numeric(gsub("[^[:digit:]]+", "", table1_lbs_1516$usage))
# Stack the 2006-2014 and 2015-2016 pounds-used tables (the years do not overlap) and sort by year
combined_pesticide_use <- bind_rows(table1_lbs, table1_lbs_1516) %>% arrange(usage)
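The observation that Fresno and Kern counties used the most pesticide can be checked directly from the long table. You sum pounds per county and then sort. Here is a minimal sketch using a hypothetical table with the same `county`/`usage`/`value` columns as `combined_pesticide_use` (the pounds are made up).

```r
library(dplyr)

# Hypothetical long-format pesticide use: county, year in `usage`, pounds in `value`
pest <- tibble(
  county = rep(c("Fresno", "Kern", "Marin"), each = 2),
  usage  = rep(c(2015, 2016), times = 3),
  value  = c(30, 32, 28, 31, 1, 1)
)

# Sum pounds over all years per county and sort from highest to lowest use
top_counties <- pest %>%
  group_by(county) %>%
  summarise(total_lbs = sum(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_lbs))

head(top_counties$county, 2)  # "Fresno" "Kern"
```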

averagebw <-df2 %>% select("County", "Year", "rate")
pesticide_averagebw_join <- averagebw %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage")) 

averagept <-df1_pt %>% select("County", "Year", "rate")
pesticide_averagept_join <- averagept %>% inner_join(combined_pesticide_use, by = c("County" = "county", "Year" = "usage")) 

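#bar graph of low birth weight: each county has one row per year (2007-2016), so plot the mean annual rate instead of stacking the years
pesticide_averagebw_join %>% ggplot(aes(County, rate)) + stat_summary(fun = mean, geom = "col") +
                ylab("Mean Annual Low Birth Weight Rate, 2007-2016 (%)") + xlab("") +
                theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2))

#bar graph of pesticide: geom_col() stacks the yearly rows, so each bar is the county's total pounds summed over the joined years
pesticide_averagept_join %>% ggplot(aes(County, value)) + geom_col() + ylab("Total Pesticide Use, Summed over Years (Pounds)") + xlab("") +
            theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 1/2))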
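With one row per county and year, `geom_col()` stacks the yearly bars of each county. The height of each bar is therefore the sum over years, not a single year's value. That sum is meaningful for pounds of pesticide but not for a rate. Here is a minimal sketch on hypothetical rates that makes both quantities explicit before plotting.

```r
library(dplyr)
library(ggplot2)

# Hypothetical low birth weight rates (%) for two counties over three years
rates <- tibble(County = rep(c("Fresno", "Kern"), each = 3),
                Year   = rep(2014:2016, times = 2),
                rate   = c(7, 7.5, 8, 6, 6.5, 7))

# What a stacked geom_col() bar would show: the sum of the annual rates
stacked <- rates %>% group_by(County) %>% summarise(stacked_height = sum(rate))

# Mean annual rate per county, plotted as a single bar per county
mean_rates <- rates %>% group_by(County) %>% summarise(mean_rate = mean(rate))
p <- ggplot(mean_rates, aes(County, mean_rate)) +
  geom_col() +
  ylab("Mean Low Birth Weight Rate, 2014-2016 (%)") + xlab("")
```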

Code for Creating the Map - Leaflet Map of the Pesticide Use Data I Wrangled Earlier (See the Shiny App for the Final Result)

This is a similar spatial map, but for pesticide use. The code below reads the US county shapefile and keeps only California. It then merges that spatial object with the 2016 pesticide use data I wrangled earlier and draws a Leaflet map.

# Keep the 2016 pounds-used values (one row per county) to color the map
averagept_df <- combined_pesticide_use %>% filter(usage == 2016)

map <- readOGR(path.expand("cb_2018_us_county_20m.shp"),
               layer = "cb_2018_us_county_20m", stringsAsFactors = FALSE)
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\zay-z\Documents\Harvard Chan\Fall 2020\BST260\datascience-project\Data Prep (& Final RMD)\cb_2018_us_county_20m.shp", layer: "cb_2018_us_county_20m"
## with 3220 features
## It has 9 fields
## Integer64 fields read as strings:  ALAND AWATER
Statekey<-read.csv('./STATEFPtoSTATENAME_Key.csv', colClasses=c('character'))
map<-merge(x=map, y=Statekey, by="STATEFP", all=TRUE)
SingleState <- subset(map, map$STATENAME %in% c(
    "California"
))

spatial_pesticide <- sp::merge(x = SingleState, y = averagept_df, by.x = "NAME", by.y = "county")
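`sp::merge()` keeps every polygon by default and fills unmatched ones with `NA`, so a county spelled differently in the two sources (case, stray spaces) would quietly lose its fill color. A minimal sketch of checking the join keys first, using hypothetical name vectors in place of `SingleState$NAME` and `averagept_df$county`:

```r
# Hypothetical key vectors standing in for SingleState$NAME and averagept_df$county
shape_names <- c("Kern", "Fresno", "San Joaquin")
pest_names  <- c("Kern", "fresno", "San Joaquin ")

# Polygons whose names find no exact match (they would get NA values)
setdiff(shape_names, pest_names)

# Comparing on trimmed, lower-cased keys shows the mismatches are cosmetic
key <- function(x) tolower(trimws(x))
setdiff(key(shape_names), key(pest_names))
```

If normalized keys match, adding a normalized key column to both objects and merging on it avoids losing those counties.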

binpes <- c(200, 100145, 1131454, 3345277, Inf)
pal3 <- colorBin(
    palette = "magma",
    domain = spatial_pesticide$value, bins = binpes)
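With an explicit vector of break points, `colorBin()` assigns each county to one interval of `binpes`. Assuming leaflet bins values the way base R's `cut()` does with `right = FALSE` and `include.lowest = TRUE` (our reading of its defaults), the assignment can be previewed directly; the pound totals below are hypothetical:

```r
binpes <- c(200, 100145, 1131454, 3345277, Inf)

# Hypothetical county totals (pounds)
pounds <- c(150, 5000, 100145, 2e6, 4e6)

# Interval index per value: [200, 100145) is 1, ..., [3345277, Inf] is 4
bin_index <- cut(pounds, breaks = binpes, labels = FALSE,
                 include.lowest = TRUE, right = FALSE)
bin_index
```

A total under 200 lb falls outside every interval, so it would be drawn in `colorBin()`'s `na.color` (gray by default); the lower edge of `binpes` therefore matters.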

leaflet(spatial_pesticide, options = leafletOptions(zoomControl = TRUE, zoomLevelFixed = FALSE, dragging=TRUE, minZoom = 5.3, maxZoom = 9)) %>% 
                setView(lat = 36.778259, lng = -119.417931, zoom = 6) %>%
                addPolygons(color = "Black", weight = 1, smoothFactor = 0.5, 
                            opacity = 1.0, fillOpacity = 0.5, layerId = ~NAME,
                            fillColor = ~pal3(value), 
                            popup = ~paste0("<center><b><font size=\"4\">County: ", NAME, "</font></b></center>",
                                            "Amount of pesticides used (pounds): ", sprintf("%1.2f", value), "<br/>")) %>%
                addLegend(pal = pal3, values = spatial_pesticide$value, opacity = 1, title = "Pesticide Use, 2016 (Pounds)")
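Leaflet popups are plain HTML strings, so they are easiest to debug when every tag is closed in the order it was opened. A minimal base-R sketch of building one popup with a hypothetical county and total; `sprintf()` and `formatC()` are vectorized, so the same expression could be used inside a `~` formula over whole columns:

```r
# Hypothetical county name and total (pounds)
county <- "Kern"
pounds <- 2500000

# Whole pounds with thousands separators read more easily than "2500000.00"
popup <- sprintf("<b>County:</b> %s<br/>Pesticides used: %s lb",
                 county,
                 formatC(pounds, format = "f", digits = 0, big.mark = ","))
popup
```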

Regression Analysis

# Top 10 counties in terms of pesticide use
agro <- c("Kern", "Tulare", "Fresno", "Monterey", "Merced", "Stanislaus", 
          "San Joaquin", "Ventura", "Imperial", "Kings")

mch_regression <- MCH.CDC.Data_Race %>% 
  filter(Year == 2016) %>%
  mutate(agricultural = ifelse(County %in% agro, 1, 0))
linmod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age * factor(agricultural), data = mch_regression)
summary(linmod)["coefficients"]
## $coefficients
##                                                     Estimate Std. Error
## (Intercept)                                       -6880.0613  995.26975
## Average.LMP.Gestational.Age                         262.4304   25.70411
## factor(agricultural)1                              5256.6615 1888.23917
## Average.LMP.Gestational.Age:factor(agricultural)1  -136.4639   48.80666
##                                                     t value     Pr(>|t|)
## (Intercept)                                       -6.912760 1.795659e-10
## Average.LMP.Gestational.Age                       10.209666 1.996331e-18
## factor(agricultural)1                              2.783896 6.154339e-03
## Average.LMP.Gestational.Age:factor(agricultural)1 -2.796011 5.940801e-03
summary(linmod)["adj.r.squared"]
## $adj.r.squared
## [1] 0.4615328
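With an interaction term, the agricultural indicator changes both the intercept and the slope, so its two coefficients are easiest to read together. A short arithmetic sketch using only the estimates printed above: non-agricultural counties gain about 262 g of average birth weight per week of gestational age, agricultural counties about 126 g, and the two fitted lines cross near 38.5 weeks. That crossing lies inside the plotted range of 37.6 to 39.6 weeks, so agricultural counties are predicted slightly higher below it and slightly lower above it.

```r
# Pooled-model coefficients copied from the summary(linmod) output above
b <- c(intercept = -6880.0613, ga = 262.4304,
       ag = 5256.6615, ga_ag = -136.4639)

# Slope of birth weight on gestational age (grams per week) in each group
slope_nonag <- b[["ga"]]
slope_ag    <- b[["ga"]] + b[["ga_ag"]]

# Gestational age where the group difference (ag + ga_ag * GA) equals zero
crossing <- -b[["ag"]] / b[["ga_ag"]]

c(slope_nonag = slope_nonag, slope_ag = slope_ag, crossing = crossing)
```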
mch_regression %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + 
  geom_line(aes(y = predict(linmod)))  + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nAgriculture\nCounty", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcomes by Agriculture Category") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
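`geom_line(aes(y = predict(linmod)))` relies on `predict()` returning one value per row of `mch_regression`. If any row had a missing outcome or predictor, the default `na.action = na.omit` would return a shorter vector and ggplot2 would reject the length mismatch; fitting with `na.exclude` pads the predictions with `NA` instead. A base-R sketch on hypothetical toy data:

```r
# Toy data with one missing outcome (hypothetical)
d <- data.frame(x = 1:5, y = c(2.1, NA, 6.2, 7.9, 10.1))

fit_omit    <- lm(y ~ x, data = d)                          # drops row 2
fit_exclude <- lm(y ~ x, data = d, na.action = na.exclude)  # remembers row 2

length(predict(fit_omit))     # one value per complete row only
length(predict(fit_exclude))  # one value per row of d, with NA for row 2
```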

Top-ten agricultural county status, our proxy for pesticide use, did not appear to make much difference in birth weight, but race did.

# roughly parallel patterns by race; births to Black and Asian/Pacific Islander mothers fare the worst
mch_regression %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = Mothers.Race)) + 
  geom_point() +  
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Mother's Race") +
  ggtitle("Birth Weight Outcomes by Race") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)

Because race appeared to matter more than pesticide use, we stratified the analysis by race.

# For American Indian/Alaska Native mothers, the simple linear model is more parsimonious than the one with the interaction term,
# though this group's small population means its numbers may vary considerably from year to year
amerindian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "American Indian or Alaska Native"))
summary(amerindian_mod)["coefficients"]
## $coefficients
##                               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)                 -3829.7135 1246.73649 -3.071791 4.495686e-03
## Average.LMP.Gestational.Age   184.9774   32.20391  5.743941 2.857815e-06
summary(amerindian_mod)["adj.r.squared"]
## $adj.r.squared
## [1] 0.5078807
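The parsimony judgment for American Indian/Alaska Native mothers can be made explicit with a partial F-test between the nested models, or by comparing AIC. A minimal sketch on simulated toy data, where the simulated relationship (an assumption for illustration) involves gestational age only:

```r
set.seed(1)
n <- 40
# Simulated toy data: birth weight depends on gestational age only
toy <- data.frame(ga = runif(n, 37.6, 39.6),
                  agricultural = rep(0:1, length.out = n))
toy$bw <- -3800 + 185 * toy$ga + rnorm(n, sd = 40)

simple   <- lm(bw ~ ga, data = toy)
with_int <- lm(bw ~ ga * factor(agricultural), data = toy)

# Partial F-test: do the two extra terms improve the fit enough to keep them?
cmp <- anova(simple, with_int)
cmp

# Lower AIC favors the better trade-off between fit and complexity
c(simple = AIC(simple), interaction = AIC(with_int))
```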
q1 <- mch_regression %>%
  filter(Mothers.Race == "American Indian or Alaska Native") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(amerindian_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nAgriculture\nCounty", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for American Indian and Alaska Native Mothers") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
# even the simple linear regression explains little of the variation for Asian/Pacific Islander mothers (adjusted R-squared about 0.10)
asian_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age, filter(mch_regression, Mothers.Race == "Asian or Pacific Islander"))
summary(asian_mod)["coefficients"]
## $coefficients
##                               Estimate Std. Error    t value   Pr(>|t|)
## (Intercept)                 -429.60128 1658.17961 -0.2590801 0.79718280
## Average.LMP.Gestational.Age   94.23017   42.87789  2.1976402 0.03509986
summary(asian_mod)["adj.r.squared"]
## $adj.r.squared
## [1] 0.1012334
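Adjusted R-squared, the fit measure reported for each model, penalizes R-squared for the number of predictors: 1 - (1 - R^2) * (n - 1) / (n - p - 1), with n observations and p predictors. A quick base-R check on hypothetical toy data that this formula reproduces the value `summary()` reports:

```r
set.seed(42)
# Hypothetical toy data
toy <- data.frame(ga = runif(30, 37.6, 39.6))
toy$bw <- 3000 + 90 * (toy$ga - 38.5) + rnorm(30, sd = 120)

fit <- lm(bw ~ ga, data = toy)
s   <- summary(fit)

n <- nrow(toy)  # observations
p <- 1          # predictors, excluding the intercept
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

c(manual = adj_manual, summary = s$adj.r.squared)
```

A value near 0.10, as for Asian/Pacific Islander mothers, means gestational age leaves most of the county-to-county variation in average birth weight unexplained.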
q2 <- mch_regression %>%
  filter(Mothers.Race == "Asian or Pacific Islander") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + 
  geom_line(aes(y = predict(asian_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nAgriculture\nCounty", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for Asian and Pacific Islander Mothers") + 
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
black_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age * factor(agricultural), data = filter(mch_regression, Mothers.Race == "Black or African American"))
summary(black_mod)["coefficients"]
## $coefficients
##                                                     Estimate Std. Error
## (Intercept)                                       -5672.4322 1361.94810
## Average.LMP.Gestational.Age                         230.1197   35.26328
## factor(agricultural)1                              5451.8961 2461.08502
## Average.LMP.Gestational.Age:factor(agricultural)1  -142.4039   63.79834
##                                                     t value     Pr(>|t|)
## (Intercept)                                       -4.164940 2.304953e-04
## Average.LMP.Gestational.Age                        6.525760 2.774693e-07
## factor(agricultural)1                              2.215241 3.422210e-02
## Average.LMP.Gestational.Age:factor(agricultural)1 -2.232094 3.297249e-02
summary(black_mod)["adj.r.squared"]
## $adj.r.squared
## [1] 0.585938
q3 <- mch_regression %>%
  filter(Mothers.Race == "Black or African American") %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(black_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nAgriculture\nCounty", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for Black Mothers") +
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
white_mod <- lm(Average.Birth.Weight ~ Average.LMP.Gestational.Age * factor(agricultural), data = filter(mch_regression, Mothers.Race == "White"))
summary(white_mod)["coefficients"]
## $coefficients
##                                                     Estimate Std. Error
## (Intercept)                                       -5250.8344 1098.58067
## Average.LMP.Gestational.Age                         221.3943   28.26190
## factor(agricultural)1                              6590.0755 2589.80533
## Average.LMP.Gestational.Age:factor(agricultural)1  -170.1409   66.77334
##                                                     t value     Pr(>|t|)
## (Intercept)                                       -4.779653 4.035348e-05
## Average.LMP.Gestational.Age                        7.833668 7.687165e-09
## factor(agricultural)1                              2.544622 1.613772e-02
## Average.LMP.Gestational.Age:factor(agricultural)1 -2.548036 1.600829e-02
summary(white_mod)["adj.r.squared"]
## $adj.r.squared
## [1] 0.6877013
q4 <- mch_regression %>%
  filter(Mothers.Race == "White" ) %>%
  ggplot(aes(Average.LMP.Gestational.Age, Average.Birth.Weight, color = factor(agricultural))) + 
  geom_point() + geom_line(aes(y = predict(white_mod)), size = 1) + 
  xlab("Average LMP Gestational Age (weeks)") + 
  ylab("Average Birth Weight (grams)") + 
  scale_color_discrete(name = "Top Ten\nAgriculture\nCounty", labels = c('No', "Yes")) +
  ggtitle("Birth Weight Outcome for White Mothers") +
  ylim(2980, 3520) +
  xlim(37.6, 39.6)
#Imperial appears to be an influential point in the plot for Asian, Black, and White mothers
q1

q2

q3

q4
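The race-specific blocks above repeat the same fit-and-summarize pattern. One way to fit every stratum in a single pass is `split()` plus `lapply()`. This sketch uses hypothetical toy data and the simple model for every group, whereas the analysis above chose the simple or interaction model per group:

```r
set.seed(3)
# Hypothetical county-level averages for three groups
toy <- data.frame(race = rep(c("A", "B", "C"), each = 15),
                  ga   = runif(45, 37.6, 39.6))
toy$bw <- 3000 + 150 * (toy$ga - 38.5) + rnorm(45, sd = 50)

# One model per stratum
fits <- lapply(split(toy, toy$race), function(d) lm(bw ~ ga, data = d))

# Coefficients and adjusted R-squared side by side, one row per group
results <- t(sapply(fits, function(f) {
  c(coef(f), adj_r2 = summary(f)$adj.r.squared)
}))
results
```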

Checking Assumptions for Linear Models by Checking Residuals

# LINE assumptions (linearity, independence, normality, equal variance) appear to be met; Black mothers in Imperial appear to be influential
plot(linmod)

# residuals are clearly skewed, and Imperial shows up again
plot(asian_mod)

plot(amerindian_mod)

# Imperial again appears influential
plot(black_mod)

# Imperial again appears influential
plot(white_mod)
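To put a number on the influence the residual plots attribute to Imperial, Cook's distance can be computed per observation; the last of the four default `plot()` panels for an `lm` fit (Residuals vs Leverage) draws its contours. A minimal sketch on hypothetical data with one deliberately outlying point; the 4/n cutoff is only a common rule of thumb:

```r
set.seed(2)
# Hypothetical data: 19 ordinary counties plus one outlying county (row 20)
d <- data.frame(ga = c(runif(19, 37.6, 39.6), 39.5))
d$bw <- 3000 + 100 * (d$ga - 38.5) + rnorm(20, sd = 20)
d$bw[20] <- d$bw[20] - 300

fit <- lm(bw ~ ga, data = d)

# Cook's distance combines each observation's leverage and residual size
cd <- cooks.distance(fit)

# Observations above the 4 / n rule-of-thumb cutoff
which(cd > 4 / nrow(d))
```

Refitting without a flagged observation, for example `update(fit, subset = -20)`, shows how much the coefficients depend on it.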